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Abstract. In predicate abstraction, exact image computation is problematic, requiring 
in the worst case an exponential number of calls to a decision procedure. For this reason, 
software model checkers typically use a weak approximation of the image. This can result 
in a failure to prove a property, even given an adequate set of predicates. We present 
an interpolant-based method for strengthening the abstract transition relation in case of 
such failures. This approach guarantees convergence given an adequate set of predicates, 
without requiring an exact image computation. We show empirically that the method 
converges more rapidly than an earlier method based on counterexample analysis. 



Predicate abstraction |GS97j is a technique commonly used in software model checking 
in which an infinite-state system is represented abstractly by a finite-state system whose 
states are the truth valuations of a chosen set of predicates. The reachable state set of the 
abstract system corresponds to the strongest inductive invariant of the infinite-state system 
expressible as a Boolean combination of the given predicates. 

The primary computational difficulty of predicate abstraction is the abstract image 
computation. That is, given a set of predicate states (perhaps represented symbolically) we 
wish to compute the set of predicate states reachable from this set in one step of the abstract 
system. This can be done by enumerating the predicate states, using a suitable decision 
procedure to determine whether each state is reachable in one step. However, since the 
number of decision procedure calls is exponential in the number of predicates, this approach 
is practical only for small predicates sets. For this reason, software model checkers, such 
as Slam [BR02j and Blast [HJMS02] typically use weak approximations of the abstract 
image. For example, the Cartesian image approximation is the strongest cube over the 
predicates that is implied at the next time. This approximation loses all information about 
predicates that are neither deterministically true nor deterministically false at the next time. 
Perhaps surprisingly, some properties of large programs, such as operating system device 

2000 ACM Subject Classification: D.2.4, F.3.1, 

Key words and phrases: Software Model Checking, Predicate Abstraction Interpolation. 
* A preliminary version of this paper appeared in the proceedings of CAV 2005. 



RANJIT JHALA a AND KENNETH L. MCMILLAN 6 



1. Introduction 




DOI:1 0.21 68/LMCS-3 (4:1 ) 2007 



© R. Jhala and K. L. McMillan 
@ Creative Commons 



2 



R. JHALA AND K. L. MCMILLAN 



drivers, can be verified with this weak approximation [B R02[ IHJMM04] . Unfortunately, as 
we will observe, this approach fails to verify properties of even very simple programs, if the 
properties relate to data stored in arrays. 

This paper introduces an approach to approximating the transition relation of a system 
using Craig interpolants derived from proofs of bounded model checking instances. These 
interpolants are formulas that capture the information about the transition relation of the 
system that was deduced in proving the property in a bounded sense. Thus, the transition 
relation approximation we obtain is tailored to the property we are trying to prove. More- 
over, it is a formula over only state-holding variables. Hence, for abstract models produced 
by predicate abstraction, the approximate transition relation is a purely propositional for- 
mula, even though the original transition relation is characterized by a first-order formula. 
Thus, we can apply well-developed Boolean image computation methods to the approx- 
imate system, eliminating the need for a decision procedure in the image computation. 
By iteratively refining the approximate transition relation we can guarantee convergence, 
in the sense that whenever the chosen predicates are adequate to prove the property, the 
approximate transition relation is eventually strong enough to prove the property. 

The reader should bear in mind that there are two kinds of abstraction occurring 
here. The first is predicate abstraction, which produces an abstract transition system whose 
state-holding variables are propositional. The second is transition relation approximation, 
which weakens the abstract transition formula, yielding a purely propositional approximate 
transition formula. To avoid confusion, we will always refer to the former as abstraction, and 
the latter as approximation. The techniques presented here produce an exact reachability 
result for the abstract model. However, we may still fail to prove unreachability if an 
inadequate set of predicates is chosen for the abstraction. 

After beginning with an overview of closely related work (Section [2]), we introduce some 
notation and definitions related to modelling infinite-state systems symbolically and briefly 
describe the method of deriving interpolants from proofs (Section [3]). Next, we introduce 
the basic method of transition relation approximation using interpolants (Section [4]). In 
the following section, we discuss a number of optimizations of this basic method that are 
particular to software verification, including a new method to strengthen interpolants that 
makes convergence more rapid (Section [7J). Finally, we present an experimental presents 
an experimental comparison of the interpolation method with the Das and Dill method 
(Section EJ. 

2. Related work 

The chief alternative to iterative approximation is to produce an exact propositional 
characterization of the abstract transition relation. For example the method of [LBC03] 
uses small-domain techniques to translate a first-order transition formula into a proposi- 
tional one that is equisatisfiable over the state-holding predicates. However, this translation 
introduces a large number of auxiliary Boolean variables, making it impractical to use BDD- 
based methods for image computation. Though SAT-base Boolean quantifier elimination 
methods can be used, the effect is still essentially to enumerate the states in the image. By 
contrast, the interpolation-based method produces an approximate transition relation with 
no auxiliary Boolean variables, allowing efficient use of BDD-based methods. 

The most closely related method is that of Das and Dill [DDOlj . This method analyzes 
abstract counterexamples (sequences of predicate states), refining the transition relation 
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approximation in such a way as to rule out infeasible transitions. This method is effective, 
but has the disadvantage that it uses a specific counterexample and does not consider 
the property being verified. Thus it can easily generate refinements not relevant to the 
property. The interpolation-based method does not use abstract counterexamples. Rather, 
it generates facts relevant to proving the given property in a bounded sense. Thus, it tends 
to generate more relevant refinements, and as a result converges more rapidly. 

In [HJMM04] , interpolants are used to choose new predicates to refine a predicate 
abstraction. Here, we use interpolants to refine an approximation of the abstract transition 
relation for a given set of predicates. 

3. Preliminaries 

Let S be a first-order signature, consisting of individual variables and uninterpreted n- 
ary functional and propositional constants. A state formula is a first-order formula over S, 
(which may include various interpreted symbols, such as = and +). We can think of a state 
formula (p as representing a set of states, namely, the set of first-order models of (p. We will 
express the proposition that an interpretation a over S models cp by <j>\cr\. 

We also assume a first-order signature S' , disjoint from S, and containing for every 
symbol s € S, a unique symbol s' of the same type. For any formula or term cp over S, we 
will use cp' to represent the result of replacing every occurrence of a symbol s in cp with s'. 
Similarly, for any interpretation a over S, we will denote by a' the interpretation over S' 
such that a's' = as. A transition formula is a first-order formula over S U S'. We think of 
a transition formula T as representing a set of state pairs, namely the set of pairs 0-2), 
such that o~\ U a' 2 models T. Will will express the proposition that a\ U a' 2 models T by 
T[ai,a 2 }- 

The strongest postcondition of a state formula cp with respect to transition formula T, 
denoted sp T ((/>), is the strongest proposition tp such that cp A T implies tp' . We will also 
refer to this as the image of cp with respect to T. Similarly, the weakest precondition of 
a state formula cp with respect to transition formula T, denoted wp T (cp) is the weakest 
proposition tp such that tp AT implies cp'. 

A transition system is a pair (I,T), where I is a state formula and T is a transition 
formula. Given a state formula tp, we will say that tp is k-reachable in (7, T) when there 
exists a sequence of states ao, ■ ■ ■ ,o~k, such that I[o~o] and for all < i < k, T[ai, cr^+i], and 
Further, ip is reachable in (I,T) if it is A:-reachable for some k. We will say that (p is 
an invariant of (I,T) when —xp is not reachable in (I,T). A state formula <p is an inductive 
invariant of (I,T) when I implies <p and sp T ((p) implies <p (note that an inductive invariant 
is trivially an invariant). 

3.1. Bounded model checking. The fact that tp is fc-reachable in (I, T) can be expressed 
symbolically. For any symbol s, and natural number i, we will use the notation to 
represent the symbol s with i primes added. Thus, s^ is s'". A symbol with i primes will 
be used to represent the value of that symbol at time i. We also extend this notation to 
formulas. Thus, the formula <p^ is the result of adding i primes to every uninterpreted 
symbol in (p. 

Now, assuming T is total, the state formula tp is fc-reachable in (I, T) exactly when this 
formula is consistent: 

j(0> A T {0) A . . . T (k-l) A ^(k) 
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We will refer to this as a bounded model checking formula [BCCZ99] . since by testing 
satisfiability of such formulas, we can determine the reachability of a given condition within 
a bounded number of steps. 

3.2. Interpolants from proofs. Given a pair of formulas (A,B), such that A A B is 
inconsistent, an interpolant for (A, B) is a formula A with the following properties: 

• A implies A, 

• A A B is unsatisfiable, and 

• yl refers only to the common symbols of A and B. 

Here, "symbols" excludes symbols such as A and = that are part of the logic itself. Craig 
showed that for first-order formulas, an interpolant always exists for inconsistent formu- 
las |Cra57] . Of more practical interest is that, for certain proof systems, an interpolant can 
be derived from a refutation of A A B in linear time. For example, a purely propositional 
refutation of A A B using the resolution rule can be translated to an interpolant in the form 
of a Boolean circuit having the same structure as the proof |Kra97l IPud97| . 

In [McM05| it is shown that linear-size interpolants can be derived from refutations in a 
first-order theory with uninterpreted function symbols and linear arithmetic. This transla- 
tion has the property that whenever A and B are quantifier- free, the derived interpolant A 
is also quantifier- freej] We will exploit this property in the sequel. 

Heuristically, the chief advantage of interpolants derived from refutations is that they 
capture the facts that the prover derived about A in showing that A is inconsistent with B. 
Thus, if the prover tends to ignore irrelevant facts and focus on relevant ones, we can think 
of interpolation as a way of filtering out irrelevant information from A. 

For the purposes of this paper, we must extend the notion of interpolant slightly. That 
is, given an indexed set of formulas A = {a\, . . . ,a n } such that f\A is inconsistent, a 
symmetric interpolant for A is an indexed set of formulas A = {di, . . . , a n } such that each 
di implies en, and f\A is inconsistent, and each di is over the symbols common to and 
A \ cii. We can construct a symmetric interpolant for A from a refutation of /\ A by simply 
letting en be the interpolant derived from the given refutation for the pair (dj, f\ A\a{). As 
long as all the individual interpolants are derived from the same proof, we are guaranteed 
that their conjunction is inconsistent. In the sequel, if A is a symmetric interpolant for A, 
and the elements of A are not explicitly indexed, we will use the notation A(ai) to refer to 
di. 

4. Transition relation approximation 

Because of the expense of image computation in symbolic model checking, it is often 
beneficial to abstract the transition relation before model checking, removing information 
that is not relevant to the property to be proved. Some examples of techniques for this 
purpose are |CGJ + 00llMA03j . 

In this paper, we introduce a method of approximating the transition relation using 
bounded model checking and symmetric interpolation. Given a transition system (I, T) 
and a state formula ip that we wish to prove unreachable, we will use interpolation to 



Note that the Craig theorem does not guarantee the existence of quantifier- free interpolants. In general 
this depends on the choice of interpreted symbols in the logic. 
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f <- True 
repeat 

if tp unreachable in (I,T), return "unreachable" 
else, if ip reachable in k steps in (/, T) 

A^-^W,^ ),...,^*- 1 ),^*)} 

if f\ A satisfiable, return "reachable in k steps" 

else 

A «- Itp(A) 

f<-f AAf=o(^( r<<> )) <-<> 

end repeat 

Figure 1: Interpolation-based transition approximation loop. Here, Itp is a function that 
computes a symmetric interpolant for a set of formulas. 

refine an approximation T of the transition relation T, such that T implies T. The initial 
approximation is just T = True. 

We begin the refinement loop by attempting to verify the unreachabilty of tp in the 
approximate system (/, T), using an appropriate model checking algorithm. If tp is found to 
be unreachable in (I, T), we know it is unreachable in the stronger system (/, T). Suppose, 
on the other hand that ip is found to be A;-reachable in (I,T). It may be that in fact ip is 
fe-reachable in (I,T), or it may be that T is simply too weak an approximation to refute 
this. To find out, we will use bounded model checking. 

That is, we construct the following set of formulas: 

A-U^,^ ),...,^- 1 ),^} 

Note that f\ A is exactly the bounded model checking formula that characterizes /c-reachability 
of ip in (J,T). We use a decision procedure to determine satisfiability of /\A. If it is sat- 
isfiable, ip is reachable and we are done. If not, we obtain from the decision procedure a 
refutation of f\ A. From this, we extract a symmetric interpolant A. Notice that for each i 
in 0. . . k — 1, A(T^) is a formula implied by , the transition formula shifted to time i. 
Let us shift these formulas back to time 0, thus converting them to transition formulas. 
That is, for i = . . . k — 1, let: 

Ti = (A(r<*> 

where we use (f>^~^ to denote removal of i primes from <f>, when feasible. We will call these 
formulas the transition interpolants. From the properties of symmetric interpolants, we 
know the bounded model checking formula 

/ Af <0> A---T^~ 1} AVfc 

is unsatisfiable. Thus we know that the conjunction of the transition interpolants /\- Ti 
admits no path of k steps from I to ip. We now compute a refined approximation T = T A 
/\^ Tj. This becomes our approximation T in the next iteration of the loop. This procedure 
is summarized in Figure [TJ Notice that at each iteration, the refined approximation T 
is strictly stronger than T, since T allows a counterexample of k steps, but T does not. 
Thus, for finite-state systems, the loop must terminate. This is simply because we cannot 
strengthen a formula with a finite number of models infinitely. 
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The approximate transition formula T has two principle advantages over T. First, it 
contains only facts about the transition relation that were derived by the prover in resolving 
the bounded model checking problem. Thus it is in some sense an abstraction of T relative 
to ip. Second, T contains only state-holding symbols. We will say that a symbol s € S is 
state-holding in (/, T) when s occurs in /, or s' occurs in T. In the bounded model checking 
formula, the only symbols in common between and the remainder of the formula are of 
the form S" or s^ l+1 \ where s is state-holding. Thus, the transition interpolants Tj contain 
only state-holding symbols and their primed versions. 

The elimination of the non-state-holding symbols by interpolation has two potential 
benefits. First, in hardware verification there are usually many non-state-holding symbols 
representing inputs of the system. These symbols contribute substantially to the cost of the 
image computation in symbolic model checking. Second, for this paper, the chief benefit is 
in the case when the state-holding symbols are all propositional (i.e., they are propositional 
constants). In this case, even if the transition relation T is a first-order formula, the 
approximation T is a propositional formula. The individual variables and function symbols 
are eliminated by interpolation. Thus we can apply well-developed Boolean methods for 
symbolic model checking to the approximate system. In the next section, we will apply this 
approach to predicate abstraction. 

5. Application to predicate abstraction 

Predicate abstraction [GS97j is a technique commonly used in software model checking 
in which the state of an infinite-state system is represented abstractly by the truth values 
of a chosen set of predicates P. The method computes the strongest inductive invariant of 
the system expressible as a Boolean combination of these predicates. 

Let us fix a concrete transition system (I, T) and a finite set of state formulas P that 
we will refer to simply as "the predicates" . We assume a finite set V C S of uninterpreted 
propositional symbols not occurring in / or T. The set V consists of a symbol v p for every 
predicate p 6 P. We will construct an abstract transition system (I, T) whose states are the 
minterms over V . To relate the abstract and concrete systems, we define a concretization 
function 7. Given a formula over V, 7 replaces every occurrence of a symbol v p with the 
corresponding predicate p. Thus, if is a Boolean combination over V, j((f>) is the same 
combination of the corresponding predicates in P. 

For the sake of simplicity, we assume that the initial condition I is a Boolean com- 
bination of the predicates. Thus we choose / so that = I- We define the abstract 
transition relation T such that, for any two minterms s, t € 2 V , we have T[s, t] exactly when 
7(s) A T A 7(i)' is consistent. In other words, there is a transition from abstract state s to 
abstract state t exactly when there is a transition from a concrete state satisfying 7(s) to a 
concrete state satisfying j(t). 

We can easily show by induction on the number of steps that if a formula tp over V 
is unreachable in (7, T) then "f{ip) is unreachable in (I, T) (though the converse does not 
hold). To allow us to check whether a given ip is in fact reachable in the abstract system, we 
can express the abstract transition relation symbolically [LBC03]. The abstract transition 
relation can be expressed as 

((A pe pK P)) ATA (A pe p(/ v'p))) I (VUV) 
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where Q j. W denotes the "hiding" of non-VF symbols in Q by renaming them to fresh sym- 
bols in S. Hiding the concrete symbols in this way takes the place of existential quantifica- 
tion. Notice that, under this definition, the state-holding symbols of (J, T) are exactly V . 
Moreover, for any two minterms s,t € 2 V , the formula s AT At' is consistent exactly when 
7(s) A T A jit) 1 is consistent. Thus, T characterizes exactly the transitions of our abstract 
system. 

To determine whether ip is reachable in this system using the standard "symbolic" 
approach, we would compute the reachable states R of the system as the limit of the 
following recurrence: 

Ro = I 

Ri+l = Ri V Spf(Ri) 

The difficulty here is to compute the image spj>. We cannot apply standard propositional 
methods for image computation, since the transition formula T is not propositional. We 
can compute spy (<f>) as the disjunction of all the minterms s € 2 V such that (f> A T A s' 
is consistent. However, this is quite expensive in practice, since it requires an exponential 
number of calls to a theorem prover. In |LBC03j . this is avoided by translating T into 
a propositional formula that is equisatisfiable with T over V U V'. This makes it possi- 
ble to use well developed Boolean image computation methods to compute the abstract 
strongest postcondition. Nonetheless, because the translation introduces a large number of 
free propositional variables, the standard approaches to image computation using Binary 
Decision Diagrams (BDD's) were found to be inefficient. Alternative methods based on 
enumerating the satisfiable assignments using a SAT solver were found to be more effective, 
at least for small numbers of predicates. However, this method is still essentially enumera- 
tive. Its primary advantage is that information learned by the solver during the generation 
of one satisfying assignment can be reused in the next iteration. 

Here, rather than attempting to compute images exactly in the abstract system, we will 
simply observe that state-holding symbols of the abstraction (I, T) are all propositional. 
Thus, the interpolation-based transition relation approximation method of the previous 
section reduces the transition relation to a purely propositional formula. Moreover, it does 
this without introducing extraneous Boolean variables. Thus, we can apply standard BDD- 
based model checking methods to the approximated system (I, T) without concern that non- 
state-holding Boolean variables will cause a combinatorial explosion. Finally, termination 
of the approximation loop is guaranteed because the abstract state space is finite. 

6. Software model checking 

In model checking sequential deterministic programs, we can make some significant 
optimizations in the above method. 

6.1. Path-based approximation. The first optimization is to treat the program counter 
explicitly, rather than modeling it as a symbolic variable. The main advantage of this is 
that it will allow us to apply bounded model checking only to particular program paths 
(i.e., sequences of program locations) rather than to the program as a whole. 

We will say that a program H is a pair (L, R), where L is a finite set of locations, and R 
is a finite set of operations. An operation is a triple (l,T, I') where T is a transition formula, 
/ £ L is the entry location of the statement, and I' 6 L is the exit location of the statement. 
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A path of program II from location Iq € L to location 6 L is a sequence 7r £ R k ~ l , of 
the form (Zo, 7b, Zi)(Zi,Ti, Z2) • • • (^fc-i)Tfe-i, Z^). We say that the path is feasible when there 
exists a sequence of states uq ■ ■ ■ such that, for all < i < k, we have Tj[<7j, dj+i]. The 
reachability problem is to determine whether program II has a feasible path from a given 
initial location Iq to a given final location If. 

As in the previous section, we assume a fixed set of predicates P, and a corresponding set 
of uninterpreted propositional symbols V. Using these, we construct an abstract program 
ft = (L,R). For any operation r = (l,T,l') } let the abstract operation f be (Z,T, Z'), where, 
as before 

T=((/\ peP (v P P)) ATA (A pe p(p' <))) I (VU10 

The abstract operation set R is then {f | r G i?}. We can easily show that if a path 
ro • • • is feasible, then the corresponding abstract path f • • • r£_i is also feasible. Thus 
if a given location If is unreachable from Iq in the abstract program, it is unreachable from Iq 
in the concrete program. 

Now we can apply the interpolation-based approximation approach to programs. We 
will build an approximate program II = (L, R), where R consists of an operation r = (I, T, I') 
for every f = (I, T, I') in R, such that T implies T, and T is over V U V . Initially every T 
is just True. 

At every step of the iteration, we use standard model checking methods to determine 
whether the approximation II has a feasible path from Iq to If. We can do this because 
the transition formulas T are all propositional. If there is no such path, then If is not 
reachable in the concrete program and we are done. Suppose on the other hand that there 
is such a path tt = ttq ■ ■ ■ %k-x- Let W = tt q • • • Ttk-i be the corresponding path of n. We 
can construct a bounded model checking formula to determine the feasibility of this path. 
Using the notation T(r) to denote the T component of an operation r, let 

A = {r(7fi)W I i € 0. . . & — 1} 

The conjunction f\ A is consistent exactly when the abstract path w is feasible. Thus, if 
/\ A is consistent, the abstraction does not prove unreachability of If and we are done. If 
it is inconsistent, we construct a symmetric interpolant A for A. We extract transition 
interpolants as follows: 

ft = (icrfa)^))^ 

Each of these is implied by the T(7fj), the transition formula of the corresponding ab- 
stract operation. We now strengthen our approximate program II using these transi- 
tion interpolants. That is, for each abstract operation f & R, the refined approximation 
is r = (l,T(r),l') where 

T(r) = T(f) A (A{T I fi = f, i E . . . k - 1}) 

In other words, we constrain each approximate operation r by the set of transition inter- 
polants for the occurrences of f in the abstract path 7f . The refined approximate program 
is thus (L, R), where R = {r \ f £ R}. From the interpolant properties, we can easily show 
that the refined approximate program does not admit a feasible path corresponding to tt. 

We continue in this manner until either the model checker determines that the ap- 
proximate program II has no feasible path from Iq to If, or until bounded model checking 
determines that the abstract program fl does have such a feasible path. This process must 
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a[x\ *- y 

y <- y + 1 
assume z = x 
assume a[z] ^ y — 1 



transition interpolant 



(a = z)' (a[z] = y)' 
(a[z\ =y^ (a[z] =y- 1)') A ((x = z)' -- 
(a[z) = y — 1 => (a[z] = y — 1)') A x = z 
a[z] ^y-l 



Figure 2: An infeasible program path, with transition interpolants. The statement "assume 
<f>" is a guard. It aborts when is false. In the transition interpolants, we have 
replaced v p with p for clarity, but in fact these formulas are over V U V'. 

terminate, since at each step II is strengthened, and we cannot strengthen a finite set of 
prepositional formulas infinitely. 

The advantage of this approach, relative to that of section 01 is that the bounded model 
checking formula /\ A only relates to a single program path. In practice, the refutation of a 
single path using a decision procedure is considerably less costly than the refutation of all 
possible paths of a given length. 

As an example of using interpolation to compute an approximate program, Figure [2] 
shows a small program with one path, which happens to be infeasible. The method 
of |HJMM 04j chooses the predicates x = z, a[z] = y and a[z] = y — 1 to represent the 
abstract state space. Next to each operation in the path is shown the transition inter- 
polant Tj that was obtained for that operation. Note that each transition interpolant is 
implied by the semantics of the corresponding statement, and that collectively the transi- 
tion interpolants rule out the program path (the reader might wish to verify this). Moreover, 
the transition interpolant for the first statement, a[x] <— y, is x = z =4> a[z] = y. This is 
a disjunction and therefore cannot be inferred by predicate image techniques that use the 
Cartesian or Boolean programs approximations. In fact, the BLAST model checker cannot 
rule out this program path. However, using transition interpolants, we obtain a transition 
relation approximation that proves the program has no feasible path from beginning to end. 



6.2. Modeling with weakest precondition. A further optimization that we can use in 
the case of deterministic programs is that we can express the abstract transition formulas 
T in terms of the weakest precondition operator. That is, if T is deterministic, the abstract 
transition formula T is satisfiability equivalent over V LiV to: 

(A pe pK P)) A -wp T (FALSE) A (A p6 pK wp T (p))) 

Thus, if we can symbolically compute the weakest precondition operator for the operations in 
our programming language, we can use this formula in place of T as the abstract transition 
formula. In this way, the abstract transition formula is localized to just those program 
variables that are related in some way to predicates P. In particular, if tt is an assignment 
to a program variable not occurring in P, then we will have v' p <^=^ p, for every predicate 
in P. 

6.3. A hybrid approach. We can combine transition interpolants with other methods of 
approximating the transition relation or the image. For example, given a set of proposi- 
tions V, the strongest Cartesian postcondition scp T (0) of a formula <p with respect to a 
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transition formula T is the strongest cube tp over V such that <j> A T implies ip' (a cube 
is a conjunction of literals). In computing the image of a state formula <j) with respect to 
an operation f of the approximate program, we can strengthen the result by conjoining 
it with the strongest Cartesian postcondition with respect to the corresponding abstract 
operation r. Thus, the hybrid image of 4> with respect to transition f is: 

hirO) = sp T(f) 0) A scp T(f) 0) 

This set is still an over-approximation of the exact abstract image spy( f )(</>), so it is sound 
to use the hybrid image in the reachability computation. This may result in fewer iterations 
of the refinement loop. 

7. Computing strong interpolants 

In preliminary tests of the method, we found that transition interpolants derived from 
proofs by the method of [McM03] were often unnecessarily weak. For example, we might 
obtain (pAq) =>• (p' Aq') when the stronger (p =>■ p')A(q => q') could be proved. This slowed 
convergence substantially. The experiments presented in this paper use a modified version 
of the method of |McM03j which incorporates technique of strengthening the interpolant 
obtained from a resolution proof . It is important in practice to compute strong transition 
interpolants, to reduce the number of refinement iterations needed to compute the transiton 
relation approximation. 

As an example of this, notice that in Figure El the transition interpolant for the second 
step is a conjunction of disjunctions: 

[a[z] = y =>• (a[z] = y— 1)') A ((x = z)' => x = z) 

However, other valid interpolants are possible. For example, we might have obtained a 
weaker version: 

(x = z)' =>- (x = z A (a[z] = y^ (a[z] = y- 1)')) 
This formula has been weakened by pulling one disjunction outside of the conjunction, 
though it is still sufficient to rule out this particular program path. The stronger interpolant 
has the advantage that it may be more useful in ruling out other program paths in a more 
complex program. Unfortunately, either of these interpolants might be obtained in practice, 
depending on the exact order of resolution steps generated by the prover. The order of 
resolution steps generated by a SAT solver depends on the order in which implications are 
propagated by the Boolean constraint propagation (BCP) procedure, and is quite arbitrary. 
Thus, it is useful in practice to try to adjust the proof before computing an interpolant, in 
such a way that a stronger interpolant results. 

To understand this process in detail, it is necessary to understand the process of gener- 
ating interpolants from resolution proofs, as described in [McM03]. A full treatment of this 
subject is beyond the scope of this paper. However, to gain some intuition about the prob- 
lem, it is only necessary to know two things about such interpolants. First, the interpolant 
for (A, B) is a Boolean circuit whose structure mirrors the structure of the resolution proof 
that refutes A A B. Second, resolutions on local atoms (those not occurring in B) gener- 
ate "or" gates, while resolutions on global atoms (those occurring in B) generate "and" 
gates. Thus, if we want to generate a strong interpolant formula, it would be best to move 
the local resolutions toward the hypotheses of the proof, and the global resolutions toward 
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the conclusion. This effectively moves the "or" gates toward the inputs of the interpolant 
circuit, and the "and" gates toward the output, thus strengthening the interpolant. 

We will think of a refutation proof by resolution as a DAG (V, E) , in which the vertices V 
are clauses. Each root of the DAG is a hypothesis of the proof, and the unique leaf is the 
empty clause (representing "false"). Each non-root vertex v has exactly two parents, which 
we will denote v + and v~ , and a pivot variable Pivot(u). The proof is valid when, for every 
non-root vertex v, v + has the form Pivot(u) V Oi and v~ has the form -iPivot(v) V O2 
and v = Oi V O2 (that is, each derived clause is the result of resolving its two parents on 
variable Pivot(v)). Figuratively speaking, each literal in a hypothesis flows down the DAG 
until it is annihilated by resolution with its negation. As an example, Figure [3h. shows a 
simple resolution refutation whose hypotheses are pV q, —>p and —>q. In this case we resolve 
first on p, then on q. 

pVq ->p pVq ->q 




q ^q p ^p 




-1 _L 



(a) (b) 

Figure 3: Exchanging the order of two resolution steps 

The order of two consecutive resolution steps in a proof can always be reversed, though 
possibly at the cost of expanding the proof. For example, Figure [3b shows the result of 
reversing the order of resolution in our simple example, so that we resolve first on q, then 
on p. This is an example of a generally valid graph transformation on resultion proofs, 
depicted in Figure [H In the figure, a box containing a variable p denotes the result of 
resolving its parents on p. The left parent of a vertex v is v + , while the right parent is v~ . 
This transformation is valid when q occurs in v\, but not in V2- The reader can easily verify 
that the right graph is a valid proof when the left one is. Note that the vertex marked * 
on the left hand side may have other successors in the graph that are not pictured. In this 
case we cannot delete this vertex from the proof when applying the transformation. Thus, 
applying the transformation may increase the size of the proof graph by one vertex. 



v.. 



p 



q 



p 



Figure 4: Proof transformation valid when v\ contains q, but i>2 does not. 
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Figure [5] shows the case when both the clauses v± and V2 contain q. In this case, we 
obtain two resolutions on q. Again, the original resolution on p, marked *, may or may not 
be deleted, depending on whether it has additional successors. Both these transformation 
are symmetric with respect to polarity. Thus, we obtain similar transformations by reversing 
the antecedents of either resolution step on the left-hand side. 




Figure 5: Proof transformation valid when both v\ and V2 contain q. 



By applying these graph transformations systematically, we can in principle move all 
of the resolutions on local atoms to the top of the proof, and all the resolutions on global 
atoms to the bottom. This would result in an interpolant in conjunctive normal form (CNF). 
However, it may also result in an exponential expansion of the proof. Instead, we will take 
a limited approach that keeps the interpolant linear in the size of the original resolution 
proof, but may not yield an interpolant in CNF. 

First, we must first take into account that the proof is a DAG and not a tree. As noted 
above, raising resolution on q above a resolution on p could result in the loss of shared 
structure, if the latter has more than one successor in the graph. To prevent this, we never 
apply the reversal transformations when the upper resolution (on p) has multiple successors. 
We will say a vertex is passable if it is not a root, and if it has exactly one successor. We 
traverse the proof in some topological order, from antecedents to consequents. Each time we 
encounter a resolution step v on a local atom q, we raise this resolution step by repeatedly 
exchanging it with one of its parents, until neither of its parents is passable. This procedure 
is shown in pseudocode in Figure El 

Note that the procedure Raise, if q occurs in both v' and v'~ , then we create two 
new resolutions on q, while removing only one (this corresponds to the transformation of 
Figure [5]). Thus, the size of the proof increases. However, the final number of resolutions 
on q is no more than the number of occurrences of q in the the original proof. Thus, the 
number of resolutions we obtain after raising all the resolutions on local atoms is linear in 
the size of the original proof (if we measure it by the number of literals it contains). As a 
result the interpolant we obtain from the rewritten proof is still linear in size of the original 
proof (though it may be quadratically larger than the interpolant derived from the original 
proof). 

Also note that in procedure Raise, it may be possible to raise a given resolution v over 
either v + or v~ . We have arbitrarily chosen v + in this case, though it may be, for example, 
that the heavier of the two proof branches would be heuristically the better choice. This 
choice occurs rarely, however, in proofs generated by SAT solvers. These proofs tend to 
consist of long chains of resolutions in which one of the two antecedents are hypotheses. 
These chains are the result of Boolean constraint propagation. The unusual case in which 
both antecedents are derived clauses are typically the result of the SAT solver backtracking 
out of a decision. Thus in most cases, only one choice is possible. 
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function RESOLVE^, vx, V2) 

if q does not occur in v\ then return V2 

else if —>q does not occur in V2 then return v\ 

else 

let r be the resolvent of v\ and v 2 on q 
add r to the proof, with r + = v\ and r~ = V2 
Raise (r) 
return r 

procedure Raise(u) 
let q = Pivot(u) 
if v + is passable then 

let v' = v + and p = Pivot(v') 

set v + <— RESOLVE(g,v' + ,t;~) 

set u~ <— RESOLVE(g,t/~,u~) 

remove vertex and set PivOTfv) <— p 
else if f ~ is passable then 

let v' = v~ and p = Pivot(v') 

set v + <— RESOLVE(g, 

set v~ <— Resolve((7, u /— ) 

remove vertex u' and set Pivot(v) <— p 

procedure TransformProof 

let ci, . . . , Cn be a topological sort of the non-root proof vertices 
for i = 1 ... n do 

if Pivot(cj) is local (occurs only in A) then Raise(cj) 

Figure 6: Proof transformation procedure 

There are two reasons why, after apply the transformation procedure, we may still have 
global resolutions above local resolutions (and thus "and" gates inside "or" gates in the 
interpolant). Most obviously, the proof may not have been a tree, and thus raising some local 
resolution may have been blocked because neither parent was passable. The other reason is 
that when we raise resolution on q above one antecedent, we also raise the proof of the other 
antecedent. This may itself contain global resolutions (though as mentioned, in most cases 
the other antecedent is a hypothesis) We might imagine continuing by raising each resulting 
resolution on q above its other antecedent. However, the resulting loss of structure sharing 
would cause an exponential expansion in the proof DAG. In practice, we have found that the 
limited transformation procedure outlined figure [6] resutls in an interpolant in CNF most of 
the time, producing a substantial improvement in the performance of interpolation-based 
refinement over the basic procedure of [McM03]. 

8. Experiments 

We now experimentally compare the method of the previous section with a method 
due to Das and Dill [DD01] . This method refines an approximate transition relation by 
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analyzing counterexamples from the approximate system to infer a refinement that rules 
out each counterexample. More precisely, a counterexample of the approximate program 
(L,R) is an alternating sequence n = aofoai ■ ■ -rk-io~k, where each Oi is a minterm over 
V, each fj is an operation in R, l(ro) = lo, l'(rk-i) = If, and for all < i < k, we have 
T(fi)[o~i, Cj+i]. This induces a set of transition minterms, ti = o~i A for < i < k. 

Note that each ti is by definition consistent with T{fj). 

To refine the approximate program, we test each ti for consistency with the correspond- 
ing abstract transition formula T(fi). If it is inconsistent, the counterexample is false (due 
to over-approximation). Using an incremental decision procedure, we then greedily remove 
literals from ti that can be removed while retaining inconsistency with T{fi). The result is 
a minimal (but not minimum) cube that is inconsistent with T{fi). The negation of this 
cube is implied by T(fi), so we use it to strengthen corresponding approximate transition 
formula T(fi). Since more than one transition minterm may be inconsistent, we may refine 
several approximate operations in this way (however if none are inconsistent, we have found 
a true counterexample of the abstraction). 

Both approximation refinement procedures are embedded as subroutines of the Blast 
software model checker. Whenever the model checker finds a path from an initial state to 
a failure state in the approximate program, it calls the refinement procedure. If refinement 
fails because the abstraction does not prove the property, the procedure of [HJMM04J is 
used to add predicates to the abstraction. Since both refinement methods are embedded in 
the same model checking procedure and use the same decision procedure, we can obtain a 
fairly direct comparison. 

Our benchmarks are a set of C programs with assertions embedded to test properties 
relating to the contents of arrays Some of these programs were written expressly as tests. 
Others were obtained by adding assertions to a sample device driver for the Linux operating 
system from a textbook |RC01| . Most of the properties are true. None of the properties 
can be verified or refuted by Blast without using a refinement procedure, due to its use of 
the Cartesian image. 

Figure [7] shows a comparison in terms of run time (on a 3GHz Intel Xeon processor) and 
number of refinement steps. The latter includes refinement steps that fail, causing predi- 
cates to be added. Run time includes model checking, refinement, and predicate selection. 
Each point represents a single benchmark problem. The X axis represents the Das/Dill 
method and the Y axis the interpolation-based method. Points below the heavy diagonal 
represent wins for the interpolation method, while points below the light diagonal represent 
improvements of an order of magnitude (note in one case a run-time improvement of two 
orders of magnitude is obtained). Figure [8] shows the same comparison with the hybrid 
image computation. Here, the reduction in number of refinement steps is less pronounced, 
since less information must be learned by refinement. 

The lower number of refinement steps required by interpolation method is easily ex- 
plained. The Das/Dill method uses a specific counterexample and does not consider the 
property being verified. Thus it can easily generate refinements not relevant to proving the 
property. The interpolation procedure considers only the program path, and generates facts 
relevant to proving the property for that path. Thus, it tends to generate more relevant 
refinements, and as a result it converges in fewer refinements. 



Available at http : //www-cad. eecs .berkeley . edu/~kenmcmil/cav05data. tar .gz 
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Figure 7: Comparison of the Das/Dill and interpolation-based methods as to run time and 
number of refinement steps. 
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Figure 8: Comparison of the Das/Dill and interpolation-based refinement methods, using 
the hybrid image. 



Figure [9] compares the performance the interpolation-based method with and without 
hybrid image computation. Though the hybrid method can reduce the number of refinement 
steps, it sometimes increases the run time due to the cost of computing the Cartesian image 
using a decision procedure. 

9. Conclusions 

We have described a method that combines bounded model checking and interpolation 
to approximate the transition relation of a system with respect to a given safety property. 
The method is extensible to liveness properties of finite-state systems, in the same manner 
as the method of |MA03| . When used with predicate abstraction, the method eliminates 
the individual variables and function symbols from the approximate transition formula, 
leaving it in a propositional form. Unlike the method of [LBC03J . it does this without 
introducing extraneous Boolean variables. Thus, we can apply standard symbolic model 
checking methods to the approximate system. 

For a set of benchmark programs, the method was found to converge more rapidly than 
the counterexample-based method of Das and Dill, primarily due to the prover's ability 
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Figure 9: Comparison of the interpolation-based refinement methods, without and with 
hybrid image. 



to focus the proof, and therefore the refinements, on facts relevant to the property. The 
benchmark programs used here are small (the largest being a sample device driver from a 
textbook), and the benchmark set contains only 19 problems. Thus we cannot draw broad 
conclusions about the applicability of the method. However, the experiments do show a 
potential to speed the convergence of transition relation refinement for real programs. Our 
hope is that this will make it easier to model check data-oriented rather than control-oriented 
properties of software. 
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